MDL-Based Unsupervised Attribute Ranking

Author

  • Zdravko Markov
Abstract

In this paper we propose an unsupervised attribute ranking method that evaluates the quality of the clustering each attribute produces when the data are partitioned into subsets according to its values. We use the Minimum Description Length (MDL) principle to evaluate clustering quality, and describe an algorithm for attribute ranking and a related clustering algorithm. Both algorithms are empirically evaluated on benchmark data sets. The experiments show that the MDL-based ranking performs close to the supervised information gain ranking and thus improves the performance of the EM and k-means clustering algorithms in a purely unsupervised setting.
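The abstract does not give the encoding used to compute description lengths, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes categorical attributes and a simple two-part code (empirical-entropy data cost plus a rough parameter cost of half a log per free parameter). The function names `code_length`, `description_length`, and `rank_attributes` are hypothetical and introduced here for illustration.

```python
import math
from collections import Counter, defaultdict


def code_length(values):
    """Bits needed to encode a list of categorical values (assumed two-part code)."""
    n = len(values)
    counts = Counter(values)
    # Data cost: code each value with its empirical probability within the list.
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    # Model cost (assumption): 0.5 * log2(n) bits per free parameter.
    model_bits = 0.5 * (len(counts) - 1) * math.log2(n)
    return data_bits + model_bits


def description_length(rows, split_attr):
    """Total bits when the rows are partitioned by the values of split_attr."""
    n_attrs = len(rows[0])
    clusters = defaultdict(list)
    for row in rows:
        clusters[row[split_attr]].append(row)

    # Cost of stating which cluster each row belongs to.
    total = code_length([row[split_attr] for row in rows])
    # Cost of the remaining attributes, encoded separately inside each cluster.
    for members in clusters.values():
        for attr in range(n_attrs):
            if attr != split_attr:
                total += code_length([row[attr] for row in members])
    return total


def rank_attributes(rows):
    """Attribute indices ordered from shortest to longest description length."""
    return sorted(range(len(rows[0])), key=lambda a: description_length(rows, a))


# Toy data: attributes 0-2 move together, attribute 3 is unrelated noise.
rows = [("a", "x", "p", i % 2) for i in range(4)] + \
       [("b", "y", "q", i % 2) for i in range(4)]
print(rank_attributes(rows))
```

In this toy data, splitting on any of the three correlated attributes yields clusters in which the other two are constant, so their description length is shorter and they rank ahead of the noise attribute. No class labels are used at any point, which is what makes the ranking unsupervised.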

Similar resources

Fast Unsupervised Automobile Insurance Fraud Detection Based on Spectral Ranking of Anomalies

Collecting insurance fraud samples is costly and, if performed manually, very time consuming. This suggests the use of unsupervised models. One accurate method in this regard is Spectral Ranking of Anomalies (SRA), which has been shown to work better than other methods for auto insurance fraud detection specifically. However, this approach is not scalable to large samples and is not appro...

A New Balancing and Ranking Method based on Hesitant Fuzzy Sets for Solving Decision-making Problems under Uncertainty

The purpose of this paper is to extend a new balancing and ranking method to handle uncertainty for a multiple attribute analysis under a hesitant fuzzy environment. The presented hesitant fuzzy balancing and ranking (HF-BR) method does not require attributes’ weights through the process of multiple attribute decision making (MADM) under hesitant conditions. For the rating of possible alternati...

An improved MDL-based compression algorithm for unsupervised word segmentation

We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.

Unsupervised Progressive Parsing of Poisson Fields Using Minimum Description Length Criteria

This paper describes novel methods for estimating piecewise homogeneous Poisson fields based on minimum description length (MDL) criteria. By adopting a coding-theoretic approach, our methods are able to adapt to the observed field in an unsupervised manner. We present a parsing scheme based on fixed multiscale trees (binary for 1D, quad for 2D) and an adaptive recursive partitioning algorithm, b...

Unsupervised Segmentation of Poisson Data

This paper describes a new approach to the analysis of Poisson point processes, in time (1D) or space (2D), which is based on the minimum description length (MDL) framework. Specifically, we describe a fully unsupervised recursive segmentation algorithm for 1D and 2D observations. Experiments illustrate the good performance of the proposed methods.

Publication date: 2013